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Numerical  Techniques  Based  on 
Radial  Basis  Functions 


Robert  Schaback  and  Holger  Wendland 


Abstract.  Radial  basis  functions  are  tools  for  reconstruction  of  mul- 
tivariate functions  from  scattered  data.  This  includes,  for  instance,  re- 
construction of  surfaces  from  large  sets  of  measurements,  and  solving  par- 
tial differential  equations  by  collocation.  The  resulting  very  large  linear 
N X N systems  require  efficient  techniques  for  their  solution,  preferably 
of  O(N)  or  0(N  log  N)  computational  complexity.  This  contribution  de- 
scribes some  special  lines  of  research  towards  this  future  goal.  Theoretical 
results  are  accompanied  by  numerical  examples,  and  various  open  prob- 
lems are  pointed  out. 


§1.  Introduction 

Many  problems  of  numerical  analysis  take  the  form  of  a generalized  interpo- 
lation in  spaces  of  multivariate  functions  [21].  Due  to  the  Mairhuber-Curtis 
theorem  [12],  such  spaces  cannot  be  fixed  beforehand,  but  must  necessarily 
depend  on  the  given  data.  For  a plain  multivariate  interpolation  problem  on  a 
finite  set  X = {xi, . . . , x/v}  of  pairwise  different  points  in  a domain  fi  C ]Rd, 
there  is  an  easy  possibility  to  generate  a data-dependent  space  via  linear  com- 
binations of  something  that  depends  on  a free  variable  i£llC  ]Rd  and  the 
data  locations  Xj , namely 


Sx,$  ■—  span  {$(x,Xj)  : 1 < j < N}  (1) 

with  a fixed  function  $ : fi  x fi  — > 1R.  The  numerical  generation  of  the  space 
can  be  simplified  considerably  in  the  special  situations 

1)  $(x,  y)  = <p(x  — y ) with  <j>  : JRfi  — > H (translation  invariance), 

2)  $(x, y)  = (j)(\\x  — J/H2)  with  4>  : [0,oo)  -4  ]R  (radiality), 
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and  this  is  how  the  notion  of  a radial  basis  function  came  up.  To  assure  that 
the  interpolation  in  the  points  of  X = {xi, . . . , x/v}  is  uniquely  defined,  the 
matrix 

a'* ) ) i < j, Jt < at  (2) 

must  be  nonsingular.  By  definition,  positive  definite  functions  $ even  make 
this  matrix  symmetric  and  positive  definite,  and  the  positive  definite  radial 
functions 

4>(r)  = exp(— r2)  on  for  all  d (Gaussian), 

<j>(r ) = (1  — r)+(l  + 4 r)  on  lRd, d < 3,  see  [25], 

are  typical  examples.  See  the  review  articles  [4,15,18,16]  for  details.  Though 
the  above  functions  are  scalar,  the  positive  definiteness  property  of  the  second 
depends  on  the  dimension  d of  the  space  containing  x and  y when  forming  the 
scalar  argument  r = \\x  — y^. 

These  two  examples  already  show  that  the  matrix  A^tx  in  (2)  can  be 
sparse  or  have  a strong  off-diagonal  decay.  It  will  be  very  large  if  we  consider 
real-world  problems  with  many  data,  arising  e.g.  from  inverse  engineering  or 
terrain  modelling.  If  the  data  points  are  rather  densely  scattered  over  the 
domain  ft,  the  approximation  power  of  the  space  (1)  will  be  very  good,  but 
the  matrix  in  (2)  will  have  lots  of  similar  rows  and  columns,  yielding  a bad 
condition  number.  The  connection  between  these  phenomena  is  described  in 
some  detail  in  [17]. 

This  paper  concentrates  on  the  numerical  solution  of  large  symmetric  pos- 
itive definite  systems  with  matrices  of  the  form  (2).  There  are  some  additional 
goals: 

1)  O(N)  complexity  of  solving  the  N x N system, 

2)  0(1)  complexity  of  evaluating  an  element  of  (1)  with  N terms  and 

3)  getting  away  with  n « N terms  at  a tolerable  loss  of  accuracy,  when 
interpolating  N data. 

We  shall  describe  greedy  algorithms  as  recently  studied  by  deVore  [5]  and 
Temlyakov  [24],  but  we  shall  omit  multiple  scales  as  proposed  by  Floater  and 
Iske  [9]  and  continued  later  in  [13]  and  [6].  The  techniques  will  be  partially 
based  on  Krylov  subspaces  as  recently  and  independently  studied  by  Faul  and 
Powell  [8]. 


§2.  Splitting  the  Native  Space  Energy 
Dropping  in  the  notation,  we  can  write  functions  from  (1)  in  the  form 

i v 

Sc,x  ■= 

j=l 


(3) 
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with  c e IR^,  X = {ati, . . . , xn}  C fi  C Rd  and  arbitrary  N.  Our  main  tool 
will  be  the  natural  inner-product 


(sc,x,Sd,Yh=  [ ) :=  ^ ^ Cjdk$(xh  yk) 

\ i k / $ i k 

(4) 

introduced  by  a positive  definite  function  $ on  the  union  of  spaces  (1)  for 
arbitrary  data  sets  X.  We  note  in  passing  that  one  can  form  the  corresponding 
Hilbert  space  closure  to  get  the  native  Hilbert  space  of  $,  but  we  refer  the 
reader  to  [11]  and  [19]  for  details.  If  we  fix  the  set  X = {xi, . . . , xx]  and  the 
corresponding  positive  definite  matrix  A$tx,  we  get  an  inner-product 

(c,d)Ai,  x :=cTA$,xd  (5) 

on  IR,V  which  is  familiar  from  the  theory  of  the  conjugate  gradient  method 
based  on  Krylov  subspaces. 

Note  that  the  inner-product  (4)  is  zero  if  the  function  sc>x  vanishes  on  Y 
or  if  Sd,Y  vanishes  on  X.  If  we  assume  Y C X and  make  sure  that  Sd,y  agrees 
with  sC]X  on  Y (e.g.  by  interpolation  on  K),  then 


(sd,Y,s c,x  ~ Sd,y)<t  = 0 

||sd,y||i  + Ikc.x  - Sd,y|||  = ||sc,x||l 

The  second  identity  can  be  viewed  as  a split  of  the  energy  of  the  function  sCjx 
into  the  energy  of  its  interpolant  s<j,y  on  a subset  Y of  X and  the  residual 
sCix  — Sd,y  • We  shall  use  this  energy  split  over  and  over  again. 

§3.  Interpolation  in  Native  Spaces 

At  this  point,  we  digress  a little  and  study  the  interpolation  of  an  arbitrary 
real- valued  function  / on  a domain  f l C Rd.  On  each  fixed  finite  subset 
X C fl  we  can  interpolate  the  values  of  / by  a function  sj)X  from  (1).  Due  to 
(6),  the  energy  ||s/,xlii>  >s  a monotonic  function  of  X with  respect  to  addition 
of  points,  and  it  can  be  easily  evaluated  using  (4).  The  energy  is  bounded 
independent  of  / if  and  only  if  [22]  the  function  / lies  in  the  native  space  of 
4>,  and  in  this  case  we  have  ||/||$  = supx  ||.s/,x ||<j>- 

This  observation  has  some  consequences  for  applications.  If  the  user  does 
not  have  any  a-priori  information  on  /,  the  proper  choice  of  $ is  a problem. 
But  if  the  behaviour  of  ||s/,.xj||  with  respect  to  X is  monitored  for  larger  and 
larger  sets  X , the  user  can  switch  to  a less  smooth  $ if  the  energy  values 
grow  dramatically  with  X.  By  (6)  the  behaviour  of  ||s/,x|||  is  related  to  the 
energy  [|/  — s/,x||$  of  the  residual,  and  further  study  of  this  as  a function  of 
X is  needed,  especially  for  / with  additional  smoothness  properties.  Current 
starting  points  are  in  [20]  and  [10],  and  readers  are  encouraged  to  proceed 
from  there. 


(6) 


362 


R.  Schaback  and  H.  Wendland 


§4.  Iteration  on  Residuals 

We  now  fix  a positive  definite  function  $ and  a function  /o  from  the  native 
space  of  $ or,  at  least,  from  some  space  of  the  form  (1)  for  a rather  large  finite 
set  X.  Our  goal  is  to  reconstruct  /o  by  an  iterative  process.  Note  that  this 
solves  a large  system  with  the  matrix  from  (2),  if  we  start  from  /o  :=  s/:x 
interpolating  some  given  function  / on  X.  In  this  case,  we  do  not  reconstruct 
/ but  rather  its  interpolant.  In  both  cases  we  need  not  worry  about  the 
existence  of  ||/o||$. 

If  we  pick  some  numerically  manageable  finite  set  Yo  and  interpolate  /o 
on  Yo,  we  can  define  f\  :=  /o  — s/ 0iy0  and  proceed  iteratively  by 

/;+ 1 fj  - sfj,Yj , Yj  C SI,  j = 0,1,...  (7) 

using  finite  sets  Yj  that  we  shall  have  to  deal  with  later.  Anyway,  the  energy 
splitting  (6)  yields 

\\sfj,Yj  III  + II/;  ~ sfj,Yj  III  = II/;  III  ,g* 

II/;  - /;+illi  + ll/;+ill!  = ll/;ll!>  1 J 

and  by  summation  we  get  the  telescoping  sum 
k k 

*11*  = Ell/i  - /;+illl  = ll/olll - ll/fc+ill!  < ll/olll  (9) 
;=o  ;= o 

which  must  necessarily  converge  for  k — > oo  even  if  the  choice  of  Yj  is  bad, 
e.g.  Yj  = Yo  for  all  j.  By  standard  Hilbert  space  argumentation  via  Cauchy 
sequences,  the  functions 

k k 

9k+ 1 :=  ^2  Sf,,Yj  : > \fj  ~ /;+ 1)  = /o  “ fk+ 1 

j— 0 j—0 

must  converge  in  the  norm  ||.||$  to  some  element  g in  the  native  space,  but 
we  do  not  want  to  use  this  fact.  Our  goal  is  to  prove  that  the  residuals  /*, 
converge  to  zero,  and  this  would  imply  that  the  functions  gt:  converge  to  /o, 
yielding  the  desired  reconstruction. 

Of  course  we  shall  need  some  additional  assumptions  on  the  sets  Yj  to 
be  successful.  Equations  (8)  and  (9)  suggest  that  we  should  let  Sjhyj  take 
up  as  much  energy  from  fj  as  possible,  and  this  will  be  our  guideline  for  the 
convergence  analysis  in  the  following  sections. 

§5.  Conditions  for  Linear  Convergence 

For  simplicity,  let  us  first  assume  that  s fl  iyj  picks  up  at  least  a fixed  percentage 
of  the  energy  of  fj,  i.e. 

lk/J,Kj|>7ll/;lll  (10) 

with  some  fixed  7 € (0, 1].  This  is  a disguised  hypothesis  on  the  proper  choice 
ofYj,  and  we  have  to  prove  later  how  to  satisfy  this  assumption.  From  (8)  and 
(10)  we  conclude  linear  convergence  of  fj  to  zero  via  ||/j+i|||  < (1  — 7)||/j|||- 
This  proves 
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Theorem  1.  If  the  choice  of  sets  Yj  satisfies  (10),  the  residual  iteration  (7) 
converges  linearly  in  the  native  space  norm,  and  there  is  an  error  bound 

ll/o-ft|ll  = IIAIII<(l-7)‘ll/o|||. 

Assumption  (10)  is  not  easy  to  handle,  because  the  norm  involves  $ 
and  the  value  of  the  right-hand  side  is  not  explicitly  known.  But  in  case  of 
f0  :=  s/tx  for  some  large  finite  set  X C fl  C we  can  restrict  ourselves  to 
sets  Yj  C X,  and  all  functions  fj  will  stay  in  the  finite-dimensional  space  (1). 
On  this  space,  we  can  pick  any  norm  || -||jc>  for  instance  any  discrete  Lp  norm 
of  functions  on  X,  and  make  use  of  the  norm  equivalence 

c||s||x  < INI*  < C||s||x  (11) 

for  all  functions  s from  the  space  (1),  where  the  constants  satisfy  0 < c < C. 
Then  we  can  try  to  get  away  with 

K,r,l&>«ll/ilft  (12) 

with  some  S € (0, 1]  instead  of  (10).  But  since  this  equation  implies  (10)  with 
7 > <5c2 /C2  > 0,  we  get 

Theorem  2.  If  the  choice  of  sets  Yj  satisfies  (12)  for  some  norm  |-j|x  of 
functions  on  X,  the  residual  iteration  (7)  for  reconstruction  of  fo  :=  s/,x 
converges  linearly  in  ||.||x  and  there  is  an  error  bound 

nr  r2  \ fc/2 

\\f-9k\\x<^(l-6^j  ll/llx- 


§6.  Maximizing  Energy  of  Interpolants 

Our  argument  at  the  end  of  Section  4 leads  to  the  problem  of  finding  a finite 
set  Y such  that  the  energy  ||s/,y|||  of  the  interpolant  of  some  function  (or 
residual)  / is  large.  If  fy  € Hly  is  the  vector  of  values  of  / on  Y,  the 
interpolant  s/,y  solves  a system  with  a matrix  Ayy  defined  as  in  (2),  and  the 
energy  is  given  by  the  quadratic  form 

IIs/, i'll!  = /y  V'/y  > ll/Hli^min  (A$,y_1)  = ||/y Ill/Ajnoz  (A^y) 

as  a function  of  / and  Y.  The  maximal  eigenvalue  Xmax  (A$,y)  is  hard  to 
discuss  in  general  (see  Narcowich  and  Ward  [14]  for  results),  and  we  simply 
view  this  quantity  as  a factor  that  depends  on  the  geometry  of  Y and  the 
number  |T|  of  points  in  Y . It  is  an  interesting  open  problem  to  design  some 
Remes-type  algorithm  based  on  exchanges  of  points  to  arrive  at  the  best  choice 
of  a set  Y with  a prescribed  number  of  points. 
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In  the  special  case  |Y|  = 1,  Y = {y}  things  are  easy.  We  get 
IIs/, rill  = f{y)2${y,y)  > f{y)2  min  $(z,  z) 

and  the  maximum  of  /2  will  be  the  best  choice,  especially  if  $ is  translation- 
invariant  or  radial. 

If  we  have  Y C X for  a large  finite  set  X,  we  can  invoke  Courant’s 
minimum-maximum  principle  to  get  Xmnx  (A^y ) < Xmax  (,A<i>tx)  as  an  upper 
bound  that  does  not  depend  on  the  choice  of  Y.  A reasonable  strategy  for 
maximizing  ||s/,y||$  then  is  to  pick  the  |Y|  points  of  X where  / takes  its 
largest  absolute  values. 

In  the  general  situation,  we  have  to  face  the  fact  that  coalescing  points 
are  not  allowed.  A reasonable  strategy  is  to  mimic  the  previous  situation,  i.e. 
to  take  some  large  set  X of  well-distributed  points  and  pick  the  points  of  X 
where  f is  largest  in  absolute  value. 

Another  possible  strategy  is  the  iterative  greedy  collection  of  more  and 
more  points,  forming  a recursive  Cholesky  factorization.  Since  this  possibility 
does  not  seem  to  be  familiar  to  researchers  in  this  area,  we  outline  the  process 
here.  Assume  that  an  interpolant  to  / on  Y is  available  together  with  the 
inverse  B of  A$,y.  Wc  now  want  to  add  another  point  z G Q.  \ Y to  Y, 
thus  enlarging  the  energy  of  the  interpolant.  A naive  choice  of  z is  via  the 
maximum  of  the  absolute  value  of  / — s fty , but  since  we  have 

||«/,vu{.}lll  = IIs/, rill  + 2/(z)  £ f{y)*{z,y)  + f2(z)$(z,z),  (13) 

yey 

the  best  choice  of  z for  fixed  Y is  obtained  by  maximizing  the  right-hand  side 
of  this  equation.  Having  found  z,  one  has  to  update  B in  a suitable  way. 
First,  calculate  the  vector  v € It!y  with  components  $(z,  y),  y eY  and  form 
w :=  Bv.  The  number 

1/a  :=  $(z,z)  — vTw  = <h(z,  z)  — vT Bv 

can  be  shown  to  be  positive,  because  4>  is  positive  definite  and  z does  not 
belong  to  Y , Then  form  u :=  —aw  and  C :=  B + uTu/a.  The  matrix 

(Cr  U ) 

\u  a ) 

then  is  the  inverse  of  A$  yu{z}  needed  for  the  interpolation  on  Y U {z}.  Un- 
fortunately, there  is  no  numerical  experience  in  this  direction  so  far,  especially 
for  the  maximization  of  (13).  A more  careful  calculation  of  the  numerical  com- 
plexity reveals  that  we  have  nothing  else  here  than  a special  formulation  of 
the  partial  Cholesky  algorithm  with  pivoting.  The  choice  of  pivots,  however, 
is  adapted  to  the  setup  of  the  problem  as  an  interpolation. 

Altogether,  this  section  was  intended  to  motivate  readers  to  look  at  the 
problem  of  finding  good  finite  sets  Y for  improvement  of  interpolants. 
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§7.  A Simple  Greedy  Algorithm 

Among  other  things,  the  previous  section  showed  how  to  work  on  subsets 
Y consisting  of  a single  point  y each.  The  best  possible  choice  is  to  take 
the  point  where  / takes  its  maximum  absolute  value,  and  the  interpolant  is 
sf,{y)  = /(y)^(j/)  y)-  We  now  do  this  iteratively  in  the  sense  of  (7)  by 

picking  Yj  :=  {yj}  with  \fj(yj)\  = ||/j||oo-  In  the  “discrete”  case  /o  :=  s/,x  we 
take  the  Chebyshev  norm  on  X , while  in  the  “continuous”  case  /o  :=  / € C(fi) 
with  fl  being  a compact  subset  of  lRrf,  we  take  the  Chebyshev  norm  on  fl. 

Due  to  Theorem  2,  the  discrete  case  leads  to  linear  convergence  towards 
/o  :=  Sftx,  because  (12)  is  satisfied  with  6 = 1.  From  a theoretical  viewpoint, 
this  is  much  better  than  the  non-quantitative  convergence  result  of  Faul  and 
Powell  in  [7].  On  the  other  hand,  there  always  is  the  conjugate  gradient 
method  as  a competitor,  and  it  has  linear  convergence,  too.  But  it  needs  to 
form  matrix-vector  products,  while  our  greedy  algorithm  does  not  even  store 
the  matrix.  It  simply  needs  two  arrays  of  length  |X|  for  the  residuals  and  the 
coefficients,  and  in  each  cycle  it  updates  one  cofficient  and  runs  once  over  the 
residuals  to  update  them  and  find  the  maximum  for  the  next  iteration.  This 
extreme  numerical  simplicity  must  come  at  a price,  and  the  price  is  very  slow 
convergence  after  some  good  progress  in  the  first  few  iterations.  We  report 
on  numerical  experiments  and  adaptive  multiscale  improvements  in  [23],  but 
at  this  point  we  want  to  direct  the  reader’s  attention  to  extend  the  above 
strategy,  e.g.  via  some  suitable  preconditioning. 

Before  we  look  more  closely  at  the  greedy  algorithm  in  the  discrete  case, 
let  us  digress  a little  into  the  continuous  case. 

Theorem  3.  If  $ is  a continuous  translation-invariant  positive  definite  func- 
tion on  a compact  domain  fl  C JRd , the  greedy  algorithm  for  interpolation  of 
a function  f from  the  native  space  of  $ converges  uniformly. 

Proof:  We  have 


and  (9)  shows  that  the  quantities  H/jH^  are  summable.  Consequently,  the 
residuals  fj  converge  uniformly  to  zero  on  the  compact  set  0.  □ 

Corollary  4.  Under  the  assumptions  on  $ as  in  Theorem  3,  the  native  space 
norm  is  expressible  via  a series 


j= 0 j=  0 

where 


fo  f j \fj{yj)\  — ll/j||oo!  fj+i  fj  f3{yj)4>{-  ~yj)/(t>( o)- 


This  result  may  look  complicated  at  first  sight,  but  it  should  be  compared 
to  other  definitions  of  the  native  space  norm,  e.g.  via  Fourier  transforms,  by 
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abstract  completion  of  a pre-Hilbert  space,  or  by  the  supremum  of  the  action 
of  certain  functionals. 

We  do  not  want  to  go  into  details  here  (see  [11],  for  instance),  but  prefer 
to  give  an  illustrative  example.  If  specialized  to  Sobolev  space  Wj  (f2)  with 
k > d/2,  one  has  to  take  (f>(x)  = ||x||2  d^2Kk-d/2 (IMI2)  in  order  to  recover 
Sobolev  space  as  a native  space  for  $(2:, y)  = 0(||x  — y||2).  Now  by  Corollary 
4 one  gets  the  Sobolev  norm  of  a function  as  a series  containing  just  function 
values  on  the  domain  in  a numerically  accessible  way,  using  neither  derivatives 
nor  integration  (but,  of  course,  maximization).  It  should  be  pointed  out  that 
this  technique  provides  some  means  to  assess  the  Sobolev  smoothness  of  a 
given  function  numerically.  Readers  are  encouraged  to  proceed  from  here. 

§8.  Dual  Techniques 

Another  possible  approach  to  solving  a large  N x N system  with  a large 
symmetric  and  positive  definite  coefficient  matrix  x via  smaller  finite  sub- 
problems is  to  define  certain  finite-dimensional  subspaces  Sj  of  the  native 
space  and  to  approximate  the  exact  solution  on  X = {aq, . . . , x/v}  by  approx- 
imation in  the  native  space  norm.  More  precisely,  the  iteration  starts  like  in 
Section  4 with  some  function  f0  and  j :=  0,  and  iterates  like  (7)  according  to 


\\fj-s}\\*  ~ in|  Wfj  ~ slk 

s£&j 

fj+ 1 :=  fj  ~ sj- 


(14) 


By  standard  arguments,  this  iteration  also  satisfies  (8)  and  the  rest  of  Section 
4,  including  the  summability  condition  (9).  Note  that  if  a space  Sj  = Sy3  ,<j> 
has  the  form  (1)  for  some  finite  set  Yj,  then  the  best  approximation  solution 
sj  in  (14)  coincides  with  s/jiyj  and  we  are  back  to  the  method  in  Section  4. 
This  observation  follows  from  Theorem  7 in  Section  9. 

But  there  are  other  possible  choices  for  the  spaces  Sj . In  particular,  Faul 
and  Powell  [7]  pick  certain  one-dimensional  spaces  Sj  = span  {uj}  for  all  j > 
0.  Then  Sj  :=  atjUj  with  cej  :=  (uj , fj)$/ (uj ,Uj)q>  solves  the  approximation 
problem,  and  we  have  the  summability  condition 

k k k 

imi! = = -;-,/■,■)! = ii/oiii  - n/fc+iii!  < n/oiii- 

j= 0 j= 0 ill'll* 

(15) 


§9.  Cyclic  and  Greedy  Dual  Strategies 

In  [7],  Faul  and  Powell  fix  N such  functions  Uj  by  a certain  precalculation  that 
we  shall  discuss  later.  These  functions  are  used  periodically,  i.e.  u j is  used  in 
step  j + kN  for  all  k > 0.  The  periodic  reuse  has  the  advantage  that  one  can 
precalculate  and  store  the  Uj,  if  their  construction  is  somewhat  involved.  We 
start  with  a generalized  and  simplified  version  of  the  convergence  result  in  [7] : 
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Theorem  5.  If  fo  is  in  the  span  of  the  functions  Uj  for  1 < j < N,  then  the 
cyclic  dual  method  of  Fa ul  and  Powell  converges  to  fo- 

Proof:  Since  everything  takes  place  in  a finite-dimensional  space,  and  since 
the  technique  involves  an  energy  split,  the  functions  gj  = fo  — fj  converge  to 
some  function  g in  the  span  of  the  Uj,  and  the  fj  converge  to  fo  — g.  But  as 
(15)  implies 


lim  ( 

k—*oo 


fj+kl*)*  = 0 — ( 


fo  - g)$ 


for  all  j,  the  functions  fo  and  g must  coincide.  □ 

There  are  lots  of  choices  of  uj  that  satisfy  the  hypothesis  of  Theorem  5. 
Conjugate  directions  and  Uj  :=  $(■,  x3)  would  do  the  job.  The  latter  strategy 
coincides  with  the  greedy  method,  if  the  cyclic  choice  is  given  up  in  favour  of 
picking  the  point  where  the  residual  is  maximal  in  absolute  value.  A linear 
convergence  result  is  possible,  if  such  a modification  is  made  in  general: 

Theorem  6.  If  fo  is  in  the  span  of  the  functions  Uj  for  1 < j < N,  then  the 
iteration  (14)  with 

(/i  »«*,-)!  :=max(fj,uk)% 

(16) 

Sj  :=  span  {uk . } 

converges  linearly  to  fo. 

Proof:  We  can  proceed  as  in  Section  5,  using 


IMIx  :=  max(u,uj)| 

i 

for  all  functions  u in  the  span  U of  all  uj.  The  assumption  (12)  is  satisfied  for 
Sj  instead  of  s fj  ty.  due  to 


(fj  i uk3  )$ 

Sj  — , v Uk- 

(Ukj,Ukj)s>  1 

IMx  ^ (sjiufcj)$  = (/jiufcj)$  = Wfj  llx  ’ 

and  the  rest  follows  easily.  □ 

The  inner-products  in  (16)  can  be  evaluated  explicitly,  if  we  work  in  the 
space  (1)  and  use  (4)  and  (5)  in  the  form 


(«c,X,Sd,x)$  = ^CfcSd!x(Zfc)-  (17) 

k 

This  is  particularly  efficient  if  the  functions  Uj  have  only  a small  number  of 
nonzero  coefficients  in  their  representation  of  the  form  Uj  = sc,  iX-  Another 
possibility,  exploiting  the  dual  nature  of  the  algorithm,  is  to  store  and  update 
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(18) 


the  inner-products  (fj,u *,)$  instead  of  the  values  fj{xk).  So  far,  there  is  no 
numerical  experience  with  dual  greedy  algorithms,  unfortunately. 

One  has  a lot  of  leeway  for  picking  suitable  functions  Uj , especially  when 
preconditioning  arguments  come  into  play.  Faul  and  Powell  use  local  Lagrange 
functions  Uj  based  on  relatively  small  subsets  Y,  of  X that  contain  Xj.  In 
particular,  Uj  £ 5^,$  is  defined  by  the  interpolation  conditions 

Uj{Xj)  = 1, 

Uj(xk)  = 0 for  all  xk  eYj,  k / j, 

and  is  expressible  in  the  form  Uj  = s cj  Yj  = sin ,x  with  at  most  \Yj\  nonzero 
coefficients.  The  precalculation  involves  the  solution  of  N systems  with  \Yj  | x 
\Yj  \ matrices  A^Yj  ,and  it  can  be  kept  at  O(N),  if  the  values  \Yj  \ are  bounded 
independent  of  N.  Our  arguments  in  Section  10  will  show  how  this  technique 
can  be  interpreted  as  preconditioning  the  matrix  A$tx-  For  a fixed  accuracy 
to  be  obtained,  and  for  their  special  choice  of  the  sets  Yj , Faul  and  Powell 
then  observe  that  they  need  only  a small  fixed  number  of  cycles  of  the  dual 
algorithm.  Each  cycle  has  N one-dimensional  subproblems,  but  there  are 
techniques  to  keep  each  subproblem  at  a reasonable  complexity,  provided  that 
techniques  like  multipole  expansions  [1]  or  compactly  supported  radial  basis 
functions  [26,  25]  are  used. 

The  selection  of  functions  Uj  is  particularly  good  if  there  are  orthogonality 
or  conjugacy  relations  among  them.  Let  us  look  at  an  inner-product  (uj,uk)$ 
in  case  of  (18),  using  (17)  and  Uj  = sc#,x-  We  get 

(uj,  «*)<*  = 4nuk(xm), 

m:Xm  &Yj 

and  this  quantity  vanishes  if  Yj  C Yk  \ {a:*}. 

This  can  be  seen  as  a motivation  for  choosing 

Xj  £ Yj  £ {xj , Xj+i , . . . (19) 

as  done  by  Faul  and  Powell.  Even  if  the  functions  Uj  are  in  general  not  mutu- 
ally orthogonal  they  are  at  least  linear  independent  as  needed  for  Theorems 
5 and  6.  To  see  this  note  that  the  matrix  C = (c()  which  describes  the  tran- 
sition from  the  basis  ($(■, xj),  1 < j < N)  to  (uj,  1 < j < N)  is  an  upper 
triangle  matrix  and  thus  invertible  if  c\  ^ 0 for  1 < i < N.  This  is  indeed  the 
case  because  of 

0tHKI|!=  C'mui{xm)  = c\. 

Tn:xm^Yj 

We  finish  this  section  by  pointing  out  how  to  make  optimal  use  of  solving 
N systems  with  \Yj  \ x |Y,  | matrices  A^^  for  subsets  Yj  in  a precalculation. 
If  the  full  inverses  of  the  A are  stored  instead  of  the  coefficients  of  Uj, 
one  can  use  the  cyclic  dual  algorithm  with  Sj  :=  Sy,A‘-  The  energy  split  at 
each  step  of  the  algorithm  will  then  be  better  or  equal  to  the  split  obtained 
by  the  dual  cyclic  algorithm  using  a single  Uj  € Sy3a>  like  the  one  used  by 
Faul  and  Powell.  This  is  clear  from  (14),  and  the  following  theorem,  which 
is  well  known  since  the  advent  of  splines,  shows  that  we  end  up  with  a cyclic 
interpolatory  method  of  the  form  (7). 
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Theorem  7.  IfY  is  a Unite  subset  of  ft,  the  approximation  problem 

inf  H/-s||$ 

for  any  f in  the  native  space  of  $ is  solved  by  the  interpolant  s /,y . 

Proof:  Equations  (6)  generalize  via  continuous  transition  to  the  Hilbert  space 
completion  to 

isf,Y,f  - «/, y)«>  = 0 

IIs/, Hi!  + 11/  - s/,y|l!  = ll/ll! 

for  all  / in  the  native  space,  and  the  assertion  follows.  □ 

Consequently,  algorithms  using  interpolants  on  finite  subsets  make  opti- 
mal use  of  the  information  contained  in  the  space  This  links  the  dual 

techniques  back  to  the  interpolatory  methods  in  Section  4.  Numerical  results 
concerning  the  above  cyclic  interpolatory  method,  e.g.  using  the  sets  Yj  of 
Faul  and  Powell,  are  still  missing.  The  progress  must  be  better  due  to  The- 
orem 7,  but  at  the  expense  of  much  more  storage.  And,  an  incorporation 
of  greedy  selections  using  the  good  preconditioning  power  of  the  Faul-Powell 
approach  seems  worth  investigating. 

§10.  Quasi-Interpolation 

There  is  a hidden  link  between  the  Faul-Powell  technique,  preconditioning  of 
and  certain  quasi-interpolation  methods  using  local  Lagrange  functions, 
as  investigated  by  Beatson,  Powell  and  their  coworkers  (see  for  example  [2]). 
If  we  write  the  interpolant  s to  some  function  / in  Lagrange  representation 

N 

Sf,X=Y,f(*i)Vj  (2°) 

/= 1 

with  N Lagrange  basis  functions  Vk  G Sx,$  satisfying  Vj[xk)  = 6jk,  we  can 
relax  (20)  to  a quasi-interpolation  formula 

N 

sf,u,x  / (xj  )uj  (21) 

/= i 

for  any  other  choice  of  functions  uj  that  approximate  the  global  Lagrange 
basis  functions  Vj.  The  choice  (18)  for  certain  subsets  Yj  is  quite  natural, 
because  one  can  often  [3]  observe  that  local  Lagrange  functions  based  on  a set 
Yj  of  neighbouring  points  to  xj  € Yj  decay  quickly  away  from  Xj . Assuming 
(18)  (but  not  (19))  from  now  on,  the  representation  (21)  can  be  rewritten  in 
terms  of  Uj  = sc,  x and  (3)  as 

N N 

Sf,u,X  =J2f{Xj)  ci$(;Xk)  = ]T$(-,Xk)  53  f(XiH' 

j— 1 k-.Xk€Yj  k= 1 i-Xk€Yj 
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The  coefficients  of  the  second  representation  can  be  evaluated  locally,  and 
the  computational  advantage  is  particularly  evident  in  case  of  compactly  sup- 
ported radial  basis  functions. 

We  now  want  to  look  at  the  quality  of  such  quasi-interpolants  on  the 
discrete  set  X itself.  The  operator  that  maps  the  vector 

f\x  :=  (f(x1),...,f(xN))T  €Rn 

to  Sf]Uix |x  € ]RAr  can  be  written  as  the  matrix  product  A<ptx  ■ C,  where 
C = (cj.)  is  the  nonsymmetric  N x N matrix  with  row  index  k and  column 
index  j containing  the  coefficients  cj.  of  the  Uj  columnwise.  The  operator  that 
generates  the  residuals  on  X then  is  Ex  — • C with  the  N x N identity 

matrix  Ex-  In  case  of  Yj  = X for  all  j we  have  C = A*,*-1,  and  there 
are  good  reasons  to  expect  that  there  are  numerically  interesting  cases  where 
some  matrix  norm  of  Ex  — ■ C is  smaller  than  one.  In  such  cases  one 

can  solve  the  problem  on  X by  successive  quasi-interpolation  via  a Neumann 
series.  In  terms  of  vectors  /J  and  containing  the  values  of  residuals  fj  and 
quasi-interpolants  to  fj  on  X,  we  have  the  linearly  convergent  iteration 

f°  :=/o|x 

:=  A*iX  • Cfj 

fj+ 1 :=  P - sj  = (EN  - • Cy+lf° 

calculating  the  interpolant  to  the  data  of  /o  on  X as  the  sum  over  the  s * . Note 
that  we  cannot  use  the  energy  split  here,  because  we  have  left  the  context  of 
interpolation  and  approximation.  Note  further  that  C acts  as  a (nonsymmet- 
ric!) preconditioner  or  an  approximate  inverse  to  A$tx  ■ 

§11.  Experiments  Concerning  Quasi-Interpolation 

To  calculate  the  norm  of  Ex  — A<$,tx  • C numerically,  we  observe  that  the 
matrix  A<ptx  ■ C has  the  entries  upXi),  where  i is  the  row  index.  Thus  the 
entry  at  ( i,j ) of  Ex  — A$}x  ■ C vanishes  for  x , £ Yj,  and  the  column-sum 
norm  of  Ex  — ■ C can  be  written  as 

max  ^ \upxi)\.  (22) 

Again  it  turns  out  that  the  decay  of  local  Lagrange  functions  is  essential. 

In  case  of  data  on  the  uniform  grid  ( h7L )2,  a radial  basis  function  <f>c 
with  support  in  [0,c],  and  sets  Yj  :=  {y  £ ( hTL )2  : ||xj  — y ||2  < R}  of 

neighbours  to  Xj  within  a radius  R,  the  norm  in  (22)  can  be  evaluated  by 
looking  at  the  local  Lagrange  function  uo  with  respect  to  the  origin  and  the 
set  Yq  :=  {y  £ (hTL)2  : ||j/||2  < li}  of  local  interpolation  points.  Since  both 
Y0  and  the  support  of  <f>  are  bounded,  the  function  uq  is  zero  on  integer  grid 
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Fig.  1.  C = 4,  R = 6,  norm  = 0.48,  and  C = 5,  R = 8,  norm  =0.29. 


N | 

5 

9 

13 

21 

25 

29 

37 

45 

49 

57 

61 

Mo.9 

5 

5 

5 

9 

29 

49 

69 

81 

97 

145 

145 

Mq.\ 

5 

5 

21 

29 

109 

137 

149 

Tab.  1.  Point  numbers  Mp  required  for  norm  < p and  N points  in  support  of  <j>. 


points  outside  the  disk  around  zero  with  radius  R + c.  Omitting  the  value 
1 at  the  origin  for  scaling  reasons,  Figure  1 shows  the  behaviour  of  uq  on 
integer  gridpoints  around  the  origin.  We  picked  two  cases  for  the  C2  function 
4>c(r)  '■=  (1  — f/c)+(  1 + 4r/c)  from  [25]  where  the  norm  of  Ex  — A^}x  • c is 
smaller  than  one,  and  the  corresponding  numbers  of  local  interpolation  points 
in  y0  are  113  and  197,  respectively. 

For  applications,  it  is  necessary  to  know  how  large  R must  be  for  fixed 
c and  h in  order  to  make  the  norm  of  Ex  — ■ c smaller  than  0.9  or 

0.1,  say.  Since  R and  c scale  with  h,  the  numbers  M and  N of  points  in  Yq 
and  the  support  of  cf>  depend  on  R/h  and  C/h,  respectively.  Given  a support 
radius  c and  a maximal  meshwidth  h such  that  the  support  of  <f>c  contains 
N = 1,5, 9, 13, . . . points,  we  provide  in  Table  1 the  minimal  number  Mp  of 
points  in  Yq  that  are  necessary  to  keep  the  norm  of  Ex  — A$tx  • c below  p. 
Another  way  of  reading  Table  1 is  that  if  the  matrix  A^tx  for  interpolation 
by  4>(:r,  y)  :=  cj) c( ||as  — 3/ 1 1 2)  on  a regular  grid  has  bandwidth  N , then  it  has 
an  approximate  inverse  with  bandwidth  Mp  that  leads  to  a residual  matrix 
of  norm  p.  The  quasi-interpolant  is  to  be  calculated  via  subproblems  with 
Mp  x Mp  matrices.  It  is  an  interesting  challenge  to  provide  sparse  approximate 
inverses  for  sparse  symmetric  positive  definite  matrices,  because  normally  the 
exact  inverses  will  not  be  sparse. 
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§12.  Conclusions 

At  first  sight,  our  results  on  linear  convergence  look  promising,  but  they  still 
are  too  weak  to  provide  a convergence  rate  that  is  independent  of  N,  since 
no  preconditioning  techniques  are  involved.  Improvements  should  thus  focus 
on  preconditioning,  e.g.  along  the  lines  of  Faul  and  Powell.  Greedy  methods 
for  fixed  <f>  are  limited  to  quick-and-dirty  approximations  with  few  nonzero 
coefficients  and  need  extension  to  multiscale  techniques.  The  adaptive  greedy 
method  in  [23]  is  a first  step,  but  the  results  shown  there  imply  that  it  has  to  be 
stopped  before  it  runs  into  scales  that  are  too  small.  A possible  continuation  at 
small  scales  is  provided  by  quasi-interpolation  as  outlined  here.  A combination 
of  both  techniques  generates  approximations  which  consist  first  of  A'  <<  N 
global  terms  obtained  by  an  adaptive  greedy  method,  followed  by  N local 
terms  constructed  by  quasi-interpolation.  The  overall  complexity  can  thus  be 
kept  at  O(N). 

Acknowledgments.  The  authors  are  grateful  to  Fabien  Hinault  for  detecting 
an  error  in  the  first  version  of  the  paper. 
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